This take home exercise aims to explain factors affecting resale prices of public housing in Singapore by building hedonic pricing models using appropriate GWR methods.
Housing price is affected by a variety of factors, including global factors such as the general economy of a country or property specific factors such as structural variables related to the property and locational variables related to the neighbourhood.
Hedonic pricing model is used to examine the effect of housing factors as on property price. However, this method fails to take into consideration that spatial autocorrelation and spatial heterogeneity exist in housing transaction data, which could lead to biased results (Anselin 1998). In view of this limitation, Geographical Weighted Regression (GWR) was introduced to calibrate hedonic price models for housing.
As such, we will be building hedonic pricing models to explain factors affecting the resale prices of public housing in Singapore using appropriate GWR methods.
The data sets used for this analysis include:
The packages used for this analysis include:
packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse', 'httr', 'units', 'matrixStats')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
st_read() of sf package is used to import the geospatial data, which is in shapefile format.
mrt_sf <- st_read(dsn="data/geospatial",
layer="MRTLRTStnPtt")
Reading layer `MRTLRTStnPtt' from data source
`C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-11-05-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 185 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
bus_sf <- st_read(dsn="data/geospatial",
layer="BusStop")
Reading layer `BusStop' from data source
`C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-11-05-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5156 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4427.938 ymin: 26482.1 xmax: 48282.5 ymax: 52983.82
Projected CRS: SVY21
mpsz_sf <- st_read(dsn="data/geospatial",
layer="MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-11-05-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
From the output message, we can see that:
mrt_sf sf data frame.bus_sf sf data frame.mpsz_sf sf data frame.Before we visualise the geospatial data, we will need to conduct data preprocessing to ensure that there are no invalid geometries and missing values.
[1] 0
[1] 0
[1] 9
There are no invalid geometries in both mrt_sfand bus_sf data frames while the mpsz_sf data frame contains 9 invalid geometries. We will now proceed to remove the invalid geometries in the mpsz_sf data frame.
mpsz_sf <- st_make_valid(mpsz_sf)
# Check again for invalid geometries
length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 0
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 22616.75 ymin: 47793.68 xmax: 22616.75 ymax: 47793.68
Projected CRS: SVY21
BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
264 47201 UNK <NA> POINT (22616.75 47793.68)
Simple feature collection with 0 features and 15 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
[7] PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
[13] Y_ADDR SHAPE_Leng SHAPE_Area geometry
<0 rows> (or 0-length row.names)
We can see that there is 1 observation with missing values in the bus_sf data frame. We will remove it because another bus stop of the same BUS_ROOF_N is found after further data exploration.
bus_sf <- bus_sf[!rowSums(is.na(bus_sf))!=0,]
# Check again for missing values
bus_sf[rowSums(is.na(bus_sf))!=0,]
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
<0 rows> (or 0-length row.names)
mrt_sf[duplicated(mrt_sf$STN_NAME),]
Simple feature collection with 19 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 22949.03 ymin: 28735.78 xmax: 42362.71 ymax: 46418.56
Projected CRS: SVY21
First 10 features:
OBJECTID STN_NAME STN_NO geometry
33 33 ANG MO KIO MRT STATION NS16 POINT (29807.27 39105.77)
97 105 MACPHERSON MRT STATION DT26 POINT (34235.8 34256.43)
103 111 BUGIS MRT STATION EW12 POINT (30491.56 31424.35)
114 122 EXPO MRT STATION DT35 POINT (42362.71 35285.67)
116 124 BUONA VISTA MRT STATION CC22 POINT (23251.76 32090.77)
121 129 MARINA BAY MRT STATION CE2 POINT (30423.43 28735.78)
124 132 CHINATOWN MRT STATION DT19 POINT (29238.97 29686.54)
134 142 TAMPINES MRT STATION DT32 POINT (40213.03 37463.37)
140 148 SERANGOON MRT STATION NE12 POINT (32480.07 36869.39)
144 152 PAYA LEBAR MRT STATION CC9 POINT (34550.58 33300.34)
bus_sf[duplicated(bus_sf$BUS_STOP_N),]
Simple feature collection with 13 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 13488.02 ymin: 29969.22 xmax: 44144.57 ymax: 47806.75
Projected CRS: SVY21
First 10 features:
BUS_STOP_N BUS_ROOF_N LOC_DESC
350 58031 UNK OPP CANBERRA DR
1569 51071 B21 MACRITCHIE RESERVOIR
2208 82221 B01 Blk 3A
2215 97079 B14 OPP ST. JOHN'S CRES
2392 62251 B03 BEF BLK 471B
2462 22501 B02 BLK 662A
2736 16119 NIL TELETECH PARK
2976 58229 B01A BLK 358A CP
3442 67421 NIL CHENG LIM STN EXIT B
3521 68091 B08 AFT BAKER ST
geometry
350 POINT (27089.69 47570.9)
1569 POINT (28305.37 36036.67)
2208 POINT (35308.74 33335.17)
2215 POINT (44144.57 38980.25)
2392 POINT (35500.36 39943.34)
2462 POINT (13488.02 35537.88)
2736 POINT (22318.89 29969.22)
2976 POINT (26094.86 47806.75)
3442 POINT (34741.77 42004.21)
3521 POINT (32038.84 43298.68)
mpsz_sf[duplicated(mpsz_sf$SUBZONE_C),]
Simple feature collection with 0 features and 15 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
[7] PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
[13] Y_ADDR SHAPE_Leng SHAPE_Area geometry
<0 rows> (or 0-length row.names)
We can observe that there are 19 and 13 duplicated observations in the mrt_sf and bus_sf data frames respectively, while there are none in the mpsz_sf data frame. We will proceed to remove the duplicated observations.
mrt_sf <- mrt_sf[!duplicated(mrt_sf$STN_NAME),]
bus_sf <- bus_sf[!duplicated(bus_sf$BUS_STOP_N),]
# Check again for duplicates
mrt_sf[duplicated(mrt_sf$STN_NAME),]
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
bus_sf[duplicated(bus_sf$BUS_STOP_N),]
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
<0 rows> (or 0-length row.names)
We will first need to retrieve the coordinate reference system for verification. st_crs() of sf package is used to do this.
st_crs(mrt_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(bus_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(mpsz_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
From the output messages, we can observe that the EPSG code for the data frames is currently 9001. This is wrong because the EPSG code of projection coordinate system SVY21 is supposed to be 3414, instead of 9001.
st_set_crs() of sf package is therefore used to assign the correct EPSG code for the data frames.
mrt_sf <- st_set_crs(mrt_sf, 3414)
bus_sf <- st_set_crs(bus_sf, 3414)
mpsz_sf <- st_set_crs(mpsz_sf, 3414)
We will now use st_crs() of sf package to retrieve the coordinate reference system again.
st_crs(mrt_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(bus_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(mpsz_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
The EPSG code is now correctly assigned for all sf data frames!!
It is useful to plot a map to visualise the geospatial data, so that we can easily get a preliminary look at the spatial patterns.
tmap_mode('view')
tm_shape(mrt_sf) +
tm_dots(col="red") +
tm_shape(bus_sf) +
tm_dots(col="blue")
If we look closely, we can see that there are 5 bus stops outside of Singapore’s boundary (46211, 46219, 46239, 46609, 47701). As we are able to travel to and fro Johor Bahru with specific buses, there are designated bus stops located at Johor Bahru.
As such, we should remove these bus stops before proceeding with our analysis.
In this section, we will proceed to remove the bus stops identified earlier.
omit_bus <- list(46211, 46219, 46239, 46609, 47701)
bus_sf[bus_sf$BUS_STOP_N %in% omit_bus,]
Simple feature collection with 5 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 17969.14 ymin: 49173.42 xmax: 20723.24 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
BUS_STOP_N BUS_ROOF_N LOC_DESC
765 47701 NIL JB SENTRAL
1532 46239 NA LARKIN TER
2257 46609 NA KOTARAYA II TER
2269 46211 NIL JOHOR BAHRU CHECKPT
4346 46219 NIL JOHOR BAHRU CHECKPT
geometry
765 POINT (20370.54 49173.42)
1532 POINT (17969.14 52983.82)
2257 POINT (20025.46 49261.6)
2269 POINT (20623.18 49199.99)
4346 POINT (20723.24 49200.05)
We can observe that the location descriptions of these bus stops indeed indicate that they are situated in Johor Bahru. We will therefore need to remove them.
bus_sf <- bus_sf[!bus_sf$BUS_STOP_N %in% omit_bus,]
# Check again if we have removed successfully
bus_sf[bus_sf$BUS_STOP_N %in% omit_bus,]
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
<0 rows> (or 0-length row.names)
We will be making use of the package onemapsgapi to query the required data sets from OneMap API. We will then save these data sets in CSV format.
read_csv() of readr package is used to import the CSV files. The outputs are tibble data frames.
eldercare = read_csv("data/aspatial/eldercare.csv")
hawker = read_csv("data/aspatial/hawkercentre.csv")
park = read_csv("data/aspatial/relaxsg.csv")
supermarket = read_csv("data/aspatial/supermarkets.csv")
kindergarten = read_csv("data/aspatial/kindergartens.csv")
childcare = read_csv("data/aspatial/childcare.csv")
school = read_csv("data/aspatial/schools.csv")
mall = read_csv("data/aspatial/shopping-mall.csv")
flat_resale = read_csv("data/aspatial/resale-flat-prices.csv")
It is also important to understand the data that we are working with. glimpse() of dplyr package is therefore used to perform exploratory data analysis.
glimpse(eldercare)
glimpse(hawker)
glimpse(park)
glimpse(supermarket)
glimpse(kindergarten)
glimpse(childcare)
glimpse(school)
glimpse(mall)
glimpse(flat_resale)
We will proceed to prepare the data such that they can be used later for the preparation of independent variables. The steps taken include:
# remove redundant columns for eldercare dataset
eldercare <- eldercare %>%
select(c(1, 4:5))
# remove redundant columns for hawker dataset
hawker <- hawker %>%
select(c(1, 14:15))
# remove redundant columns for park dataset
park <- park %>%
select(c(1, 8:9))
# remove redundant columns for supermarket dataset
supermarket <- supermarket %>%
select(c(1, 7:8))
# remove redundant columns for kindergarten dataset
kindergarten <- kindergarten %>%
select(c(1, 5:6))
# remove redundant columns for childcare dataset
childcare <- childcare %>%
select(c(1, 5:6))
# filter out primary schools; remove redundant columns
prischool <- school %>%
filter(mainlevel_code == "PRIMARY") %>%
select(c(1, 3:4, 25,27))
# filter out 4-room flats with transaction period from 01-Jan-19 to 30-Sep-20
flat_resale <- flat_resale %>%
filter(flat_type == "4 ROOM") %>%
filter(month >= "2019-01" & month < "2020-10")
We will need to conduct data preprocessing to ensure that there are no NA values in all data frames.
eldercare[rowSums(is.na(eldercare))!=0,]
hawker[rowSums(is.na(hawker))!=0,]
park[rowSums(is.na(park))!=0,]
supermarket[rowSums(is.na(supermarket))!=0,]
kindergarten[rowSums(is.na(kindergarten))!=0,]
childcare[rowSums(is.na(childcare))!=0,]
prischool[rowSums(is.na(prischool))!=0,]
mall[rowSums(is.na(mall))!=0,]
flat_resale[rowSums(is.na(flat_resale))!=0,]
There are no NA values in all the data frames.
After exploratory data analysis performed earlier, it is found that prischool and flat_resale data frames do not have Lat and Lng columns. These columns are required for the preparation of the independent variables later. We therefore need to use OneMap API to geocode the coordinates columns.
We first need to create an address column for the flat_resale data frame since the address data is currently split into 2 columns block and street name respectively.
unite() of dpylr package is used to concatenate the 2 columns.
flat_resale <- flat_resale %>%
unite("address", block:street_name, sep= " ")
We alo observe that some addresses in the the flat_resale data frame start with “ST.”. It is found later that this will pose problems during geocoding. As a result, we will need to rename “ST.” to its full version “SAINT”.
flat_resale$address <- gsub("ST\\.", "SAINT", flat_resale$address)
In this function, the input variable address is passed in as a search value in the query to the API. The output is then converted to a data frame, where we only choose to retain the Latitude and Longitude columns. These 2 columns are returned as the output.
geocode <- function(address) {
url <- "https://developers.onemap.sg/commonapi/search"
query <- list("searchVal" = address, "returnGeom" = "Y",
"getAddrDetails" = "N", "pageNum" = "1")
res <- GET(url, query = query, verbose())
output <- content(res) %>%
as.data.frame %>%
select(results.LATITUDE, results.LONGITUDE)
return(output)
}
We will now loop through each row of the prischool data frame and implement the geocode function. The output is saved as 2 new columns Lat and Lng.
prischool$Lat <- 0
prischool$Lng <- 0
for (i in 1:nrow(prischool)) {
output <- geocode(prischool$postal_code[i])
prischool$Lat[i] <- output$results.LATITUDE
prischool$Lng[i] <- output$results.LONGITUDE
}
We will now loop through each row of the flat_resale data frame and implement the geocode function. The output is saved as 2 new columns Lat and Lng.
flat_resale$Lat <- 0
flat_resale$Lng <- 0
for (i in 1:nrow(flat_resale)) {
output <- geocode(flat_resale$address[i])
flat_resale$Lat[i] <- output$results.LATITUDE
flat_resale$Lng[i] <- output$results.LONGITUDE
}
We need to conduct dummy coding on the storey_range variable for us to use it in model later. We will first look at the individual storey-range values for us to get a rough idea on the number of columns that will be produced.
unique(flat_resale$storey_range)
We observed that there are 17 storey range categories.
To conduct dummy coding, we will be using pivot_wider() of dplyr package to create duplicate variables representing every store-range. If the obeservation belongs to the storey-range, the value will be 1. The value will be 0 otherwise.
flat_resale <- flat_resale %>%
pivot_wider(names_from = storey_range, values_from = storey_range,
values_fn = list(storey_range = ~1), values_fill = 0)
We need to convert the remaining_lease column from string to numeric format to use in the model later. We will first split the string within the column and take the value(s) of the year and month. We will then calculate the remaining lease in years and replace the original value.
str_list <- str_split(flat_resale$remaining_lease, " ")
for (i in 1:length(str_list)) {
if (length(unlist(str_list[i])) > 2) {
year <- as.numeric(unlist(str_list[i])[1])
month <- as.numeric(unlist(str_list[i])[3])
flat_resale$remaining_lease[i] <- year + round(month/12, 2)
}
else {
year <- as.numeric(unlist(str_list[i])[1])
flat_resale$remaining_lease[i] <- year
}
}
We need to filter out good primary schools and save it as variable goodprischool for it to be used in the model. Good primary schools are defined to be Special Assistance Programme (SAP) primary schools or primary schools with the Gifted Education Programme (GEP).
gdprischool <- prischool %>%
filter(sap_ind == "Yes" | gifted_ind == "Yes")
The Central Business District (CBD) is in Downtown Core, located in the southwest part of Singapore. We will therefore take the coordinates of Downtown Core to be the coordinates of the CBD and convert it into a sf data frame.
lat <- 1.287953
long <- 103.851784
cbd_sf <- data.frame(lat, long) %>%
st_as_sf(coords = c("long", "lat"), crs=4326) %>%
st_transform(crs=3414)
We need to convert the data sets into sf data frames for us to calculate the proximity distance matrices.
flat_resale_sf <- st_as_sf(flat_resale,
coords = c("Lng", "Lat"), crs=4326) %>%
st_transform(crs = 3414)
eldercare_sf <- st_as_sf(eldercare,
coords = c("Lng", "Lat"), crs=4326) %>%
st_transform(crs = 3414)
hawker_sf <- st_as_sf(hawker,
coords = c("Lng", "Lat"), crs=4326) %>%
st_transform(crs = 3414)
park_sf <- st_as_sf(park,
coords = c("Lng", "Lat"), crs=4326) %>%
st_transform(crs = 3414)
gdprischool_sf <- st_as_sf(gdprischool,
coords = c("Lng", "Lat"), crs=4326) %>%
st_transform(crs = 3414)
mall_sf <- st_as_sf(mall,
coords = c("longitude", "latitude"), crs=4326) %>%
st_transform(crs = 3414)
supermarket_sf <- st_as_sf(supermarket,
coords = c("Lng", "Lat"), crs=4326) %>%
st_transform(crs = 3414)
kindergarten_sf <- st_as_sf(kindergarten,
coords = c("Lng", "Lat"), crs=4326) %>%
st_transform(crs = 3414)
childcare_sf <- st_as_sf(childcare,
coords = c("Lng", "Lat"), crs=4326) %>%
st_transform(crs = 3414)
prischool_sf <- st_as_sf(prischool,
coords = c("Lng", "Lat"), crs=4326) %>%
st_transform(crs = 3414)
This function computes the distance to the nearest facility. st_distance() of sf package is used to compute the distance between all flats and the respective facilities. rowMins() of matrixStats is then used to take the shortest distance within each row in the output matrix. The values will be appended to the data frame as a new column.
proximity <- function(df1, df2, varname) {
matrix <- st_distance(df1, df2) %>%
drop_units()
df1[,varname] <- rowMins(matrix)
return(df1)
}
We will now implement the proximity function to the required variables.
flat_resale_sf <- proximity(flat_resale_sf, cbd_sf, "PROX_CBD") %>%
proximity(., eldercare_sf, "PROX_EC") %>%
proximity(., hawker_sf, "PROX_HA") %>%
proximity(., mrt_sf, "PROX_MRT") %>%
proximity(., park_sf, "PROX_PARK") %>%
proximity(., gdprischool_sf, "PROX_GDPRI") %>%
proximity(., mall_sf, "PROX_MALL") %>%
proximity(., supermarket_sf, "PROX_SUP")
This function computes the facility count within a radius. Similarly, st_distance() of sf package is used to compute the distance between all flats and the respective facilities. rowSums() of dplyr is then used to count the observations with distances below the defined radius. The values will be appended to the data frame as a new column.
num_radius <- function(df1, df2, varname, radius) {
matrix <- st_distance(df1, df2) %>%
drop_units() %>%
as.data.frame()
df1[,varname] <- rowSums(matrix <= radius)
return(df1)
}
We will now implement the num_radius function to the required variables.
flat_resale_sf <- num_radius(flat_resale_sf, kindergarten_sf,
"NUM_KIN", 350) %>%
num_radius(., childcare_sf, "NUM_CC", 350) %>%
num_radius(., bus_sf, "NUM_STOP", 350) %>%
num_radius(., prischool_sf, "NUM_PRI", 1000)
Before saving the dataset, we will remove the redundant columns, rename the columns to shorter forms and relocate the price columns to the front of the data frame.
flat_resale_sf <- flat_resale_sf %>%
select(5, 8, 9, 10:39) %>%
rename("AREA_SQM" = "floor_area_sqm", "LEASE_YRS" = "remaining_lease",
"PRICE"= "resale_price") %>%
relocate(`PRICE`)
We can now save the final flat resale price data set as a SHP file using st_write of sf package.
st_write(flat_resale_sf, "data/geospatial/resale-flat-prices-final.shp")
st_read() of sf package is used to import the final data set, which is in shapefile format.
flat_resale_sf <- st_read(dsn="data/geospatial",
layer="resale-flat-prices-final")
Reading layer `resale-flat-prices-final' from data source
`C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-11-05-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 15853 features and 32 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 11597.31 ymin: 28217.39 xmax: 42623.63 ymax: 48741.06
Projected CRS: SVY21 / Singapore TM
We can plot the distribution of PRICE using a histogram as shown below.
ggplot(data=flat_resale_sf, aes(x=`PRICE`)) +
geom_histogram(bins=20, color="black", fill="light blue")

The figure above reveals a slightly right skewed distribution. This means that more HDB flats were transacted at relative lower prices.
Statistically, the skewed distribution can be normalised using log transformation. mutate() of dplyr package is used to derive a new variable called LOG_PRICE using a log transformation on the variable PRICE.
flat_resale_sf <- flat_resale_sf %>%
mutate(`LOG_PRICE` = log(PRICE)) %>%
relocate(`LOG_PRICE`, .after = `PRICE`)
We can now plot the distribution of LOG_PRICE.
ggplot(data=flat_resale_sf, aes(x=`LOG_PRICE`)) +
geom_histogram(bins=20, color="black", fill="light blue")

It is found that the LEASE_YRS column is still in string format. We will convert it to numeric format for us to conduct EDA and to input into the model later.
flat_resale_sf$LEASE_YRS <- as.numeric(flat_resale_sf$LEASE_YRS)
We will now draw multiple histograms to show multiple distributions of the independent variables. This is done using ggarrange() of ggpubr package.
AREA_SQM <- ggplot(data=flat_resale_sf, aes(x= `AREA_SQM`)) +
geom_histogram(bins=20, color="black", fill="light blue")
LEASE_YRS <- ggplot(data=flat_resale_sf, aes(x= `LEASE_YRS`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_CBD <- ggplot(data=flat_resale_sf, aes(x= `PROX_CBD`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_EC <- ggplot(data=flat_resale_sf, aes(x= `PROX_EC`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_HAWKER <- ggplot(data=flat_resale_sf, aes(x= `PROX_HA`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_MRT <- ggplot(data=flat_resale_sf, aes(x= `PROX_MRT`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_PARK <- ggplot(data=flat_resale_sf, aes(x= `PROX_PARK`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_GDPRI <- ggplot(data=flat_resale_sf, aes(x= `PROX_GDPRI`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_MALL <- ggplot(data=flat_resale_sf, aes(x= `PROX_MALL`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_SUP <- ggplot(data=flat_resale_sf, aes(x= `PROX_SUP`)) +
geom_histogram(bins=20, color="black", fill="light blue")
NUM_KIN <- ggplot(data=flat_resale_sf, aes(x= `NUM_KIN`)) +
geom_histogram(bins=20, color="black", fill="light blue")
NUM_CC <- ggplot(data=flat_resale_sf, aes(x= `NUM_CC`)) +
geom_histogram(bins=20, color="black", fill="light blue")
NUM_STOP <- ggplot(data=flat_resale_sf, aes(x= `NUM_STOP`)) +
geom_histogram(bins=20, color="black", fill="light blue")
NUM_PRI <- ggplot(data=flat_resale_sf, aes(x= `NUM_PRI`)) +
geom_histogram(bins=20, color="black", fill="light blue")
ggarrange(AREA_SQM, LEASE_YRS, PROX_CBD, PROX_EC, PROX_HAWKER, PROX_MRT, PROX_PARK, PROX_GDPRI, PROX_MALL, PROX_SUP, NUM_KIN, NUM_CC, NUM_STOP, NUM_PRI, ncol = 3, nrow = 5)

We can observe that the distribution of the majority of the independent variables are right skewed, while variables such as LEASE_YRS and PROX_CBD are slightly left skewed.
We want to reveal the geospatial distribution flat resale prices (i.e. dependent variable) in Singapore using the tmap package.
tmap_mode("view")
tm_shape(mpsz_sf)+
tm_polygons() +
tm_shape(flat_resale_sf) +
tm_dots(col = "PRICE",
alpha = 0.6,
style="quantile") +
tm_view(set.zoom.limits = c(11,14))
From the map, we can observe that flats with higher resale prices are more concentrated in the central area of Singapore.
We will be building the hedonic pricing model using both multiple linear regression method and appropriate GWR methods. The independent variables we are looking at include:
Before building a multiple regression model, it is important to ensure that the independent variables used are not highly correlated to each other, which will cause the quality of the model to be compromised.
Correlation matrix is commonly used to visualise the relationships between the independent variables. corrplot package will be used to plot a scatterplot matrix of the relationship between the independent variables.
flat_resale <- flat_resale_sf %>%
st_set_geometry(NULL)
corrplot(cor(flat_resale[, 3:33]), diag = FALSE, order = "AOE",
tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper")

From the scatterplot matrix, it is clear that PROX_GDPRI is highly correlated to PROX_CBD. In view of this, it is wiser to only include either one of them in the subsequent model building. As a result, PROX_CBD is excluded in the subsequent model building.
lm() is used to calibrate the multiple linear regression model.
flat_mlr <- lm(formula = PRICE ~ AREA_SQM + LEASE_YRS + X01.TO.03 + X04.TO.06 + X07.TO.09 + X10.TO.12 + X13.TO.15 + X16.TO.18 + X19.TO.21 + X22.TO.24 + X25.TO.27 + X28.TO.30 + X31.TO.33 + X34.TO.36 + X37.TO.39 + X40.TO.42 + X43.TO.45 + X46.TO.48 + X49.TO.51 + PROX_EC + PROX_HA + PROX_MRT + PROX_GDPRI + PROX_MALL + PROX_SUP + NUM_KIN + NUM_CC + NUM_STOP + NUM_PRI, data=flat_resale_sf)
summary(flat_mlr)
Call:
lm(formula = PRICE ~ AREA_SQM + LEASE_YRS + X01.TO.03 + X04.TO.06 +
X07.TO.09 + X10.TO.12 + X13.TO.15 + X16.TO.18 + X19.TO.21 +
X22.TO.24 + X25.TO.27 + X28.TO.30 + X31.TO.33 + X34.TO.36 +
X37.TO.39 + X40.TO.42 + X43.TO.45 + X46.TO.48 + X49.TO.51 +
PROX_EC + PROX_HA + PROX_MRT + PROX_GDPRI + PROX_MALL + PROX_SUP +
NUM_KIN + NUM_CC + NUM_STOP + NUM_PRI, data = flat_resale_sf)
Residuals:
Min 1Q Median 3Q Max
-258828 -42665 -8125 35489 590059
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.041e+05 1.630e+04 18.652 < 2e-16 ***
AREA_SQM 1.867e+03 8.707e+01 21.443 < 2e-16 ***
LEASE_YRS 3.370e+03 5.329e+01 63.252 < 2e-16 ***
X01.TO.03 -8.243e+04 1.280e+04 -6.439 1.24e-10 ***
X04.TO.06 -6.182e+04 1.277e+04 -4.843 1.29e-06 ***
X07.TO.09 -4.653e+04 1.275e+04 -3.651 0.000262 ***
X10.TO.12 -3.759e+04 1.274e+04 -2.949 0.003190 **
X13.TO.15 -3.288e+04 1.277e+04 -2.575 0.010021 *
X16.TO.18 -1.780e+04 1.292e+04 -1.378 0.168321
X19.TO.21 1.227e+04 1.338e+04 0.917 0.359269
X22.TO.24 8.800e+03 1.352e+04 0.651 0.515212
X25.TO.27 7.018e+04 1.431e+04 4.905 9.46e-07 ***
X28.TO.30 1.634e+05 1.477e+04 11.062 < 2e-16 ***
X31.TO.33 1.656e+05 1.775e+04 9.331 < 2e-16 ***
X34.TO.36 1.690e+05 1.858e+04 9.093 < 2e-16 ***
X37.TO.39 2.475e+05 1.828e+04 13.541 < 2e-16 ***
X40.TO.42 2.874e+05 2.028e+04 14.168 < 2e-16 ***
X43.TO.45 3.800e+05 4.446e+04 8.546 < 2e-16 ***
X46.TO.48 4.281e+05 4.447e+04 9.626 < 2e-16 ***
X49.TO.51 4.938e+05 5.370e+04 9.196 < 2e-16 ***
PROX_EC -2.908e+01 9.360e-01 -31.067 < 2e-16 ***
PROX_HA -4.390e+01 1.203e+00 -36.484 < 2e-16 ***
PROX_MRT -5.085e+01 1.649e+00 -30.844 < 2e-16 ***
PROX_GDPRI -1.192e+01 2.221e-01 -53.682 < 2e-16 ***
PROX_MALL -3.838e+01 1.796e+00 -21.362 < 2e-16 ***
PROX_SUP -4.347e+01 3.992e+00 -10.889 < 2e-16 ***
NUM_KIN 8.923e+03 6.153e+02 14.502 < 2e-16 ***
NUM_CC -5.431e+03 3.406e+02 -15.944 < 2e-16 ***
NUM_STOP -1.937e+03 2.144e+02 -9.032 < 2e-16 ***
NUM_PRI -1.970e+04 4.545e+02 -43.347 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 73720 on 15823 degrees of freedom
Multiple R-squared: 0.6245, Adjusted R-squared: 0.6238
F-statistic: 907.6 on 29 and 15823 DF, p-value: < 2.2e-16
We can observe that not all the independent variables are statistically significant. We will therefore revise the model by removing those variables which are not statistically significant.
flat_mlr1 <- lm(formula = PRICE ~ AREA_SQM + LEASE_YRS + X01.TO.03 + X04.TO.06 + X07.TO.09 + X10.TO.12 + X13.TO.15 + X25.TO.27 + X28.TO.30 + X31.TO.33 + X34.TO.36 + X37.TO.39 + X40.TO.42 + X43.TO.45 + X46.TO.48 + X49.TO.51 + PROX_EC + PROX_HA + PROX_MRT + PROX_GDPRI + PROX_MALL + PROX_SUP + NUM_KIN + NUM_CC + NUM_STOP + NUM_PRI, data=flat_resale_sf)
ols_regress(flat_mlr1)
Model Summary
----------------------------------------------------------------------
R 0.790 RMSE 73819.218
R-Squared 0.623 Coef. Var 17.026
Adj. R-Squared 0.623 MSE 5449276926.815
Pred R-Squared 0.622 MAE 54122.493
----------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------------------
Regression 1.427711e+14 26 5.491195e+12 1007.692 0.0000
Residual 8.624026e+13 15826 5449276926.815
Total 2.290113e+14 15852
--------------------------------------------------------------------------------
Parameter Estimates
-----------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-----------------------------------------------------------------------------------------------------
(Intercept) 298078.468 10367.288 28.752 0.000 277757.402 318399.534
AREA_SQM 1873.646 87.177 0.111 21.492 0.000 1702.769 2044.523
LEASE_YRS 3374.871 53.307 0.360 63.311 0.000 3270.384 3479.358
X01.TO.03 -75555.522 2581.020 -0.239 -29.274 0.000 -80614.616 -70496.428
X04.TO.06 -54950.973 2472.752 -0.193 -22.223 0.000 -59797.848 -50104.099
X07.TO.09 -39704.471 2489.436 -0.134 -15.949 0.000 -44584.049 -34824.892
X10.TO.12 -30753.479 2519.726 -0.099 -12.205 0.000 -35692.430 -25814.529
X13.TO.15 -26081.397 2773.183 -0.065 -9.405 0.000 -31517.153 -20645.642
X25.TO.27 76572.366 6975.810 0.056 10.977 0.000 62898.983 90245.748
X28.TO.30 169785.063 8162.317 0.105 20.801 0.000 153785.992 185784.133
X31.TO.33 171884.037 12503.598 0.068 13.747 0.000 147375.561 196392.513
X34.TO.36 175258.867 13667.554 0.063 12.823 0.000 148468.904 202048.830
X37.TO.39 253764.291 13251.550 0.095 19.150 0.000 227789.744 279738.837
X40.TO.42 293472.998 15901.565 0.091 18.456 0.000 262304.120 324641.876
X43.TO.45 385882.408 42694.557 0.044 9.038 0.000 302196.213 469568.603
X46.TO.48 433981.085 42707.498 0.050 10.162 0.000 350269.524 517692.646
X49.TO.51 499701.208 52270.096 0.047 9.560 0.000 397245.866 602156.550
PROX_EC -29.148 0.937 -0.161 -31.099 0.000 -30.985 -27.311
PROX_HA -44.422 1.202 -0.191 -36.946 0.000 -46.778 -42.065
PROX_MRT -50.994 1.651 -0.165 -30.892 0.000 -54.229 -47.758
PROX_GDPRI -11.918 0.222 -0.293 -53.585 0.000 -12.354 -11.482
PROX_MALL -38.886 1.797 -0.124 -21.638 0.000 -42.409 -35.363
PROX_SUP -43.883 3.997 -0.057 -10.979 0.000 -51.718 -36.049
NUM_KIN 8892.413 616.123 0.075 14.433 0.000 7684.741 10100.085
NUM_CC -5499.543 340.874 -0.090 -16.134 0.000 -6167.694 -4831.391
NUM_STOP -1912.711 214.677 -0.046 -8.910 0.000 -2333.502 -1491.920
NUM_PRI -19930.768 453.625 -0.254 -43.937 0.000 -20819.924 -19041.612
-----------------------------------------------------------------------------------------------------
We need to fulfill the 4 following assumptions to perform regression modelling on spatial data:
ols_vif_tol() of olsrr package is used to test if there are signs of multi-collinearity.
ols_vif_tol(flat_mlr1)
Variables Tolerance VIF
1 AREA_SQM 0.8956868 1.116462
2 LEASE_YRS 0.7340376 1.362328
3 X01.TO.03 0.3569196 2.801751
4 X04.TO.06 0.3167722 3.156843
5 X07.TO.09 0.3371043 2.966441
6 X10.TO.12 0.3584952 2.789438
7 X13.TO.15 0.5009265 1.996301
8 X25.TO.27 0.9175446 1.089865
9 X28.TO.30 0.9346452 1.069925
10 X31.TO.33 0.9704066 1.030496
11 X34.TO.36 0.9742236 1.026458
12 X37.TO.39 0.9717020 1.029122
13 X40.TO.42 0.9809335 1.019437
14 X43.TO.45 0.9966778 1.003333
15 X46.TO.48 0.9960739 1.003942
16 X49.TO.51 0.9973718 1.002635
17 PROX_EC 0.8912374 1.122035
18 PROX_HA 0.8943974 1.118071
19 PROX_MRT 0.8300641 1.204726
20 PROX_GDPRI 0.7980468 1.253059
21 PROX_MALL 0.7244891 1.380283
22 PROX_SUP 0.8695809 1.149979
23 NUM_KIN 0.8852389 1.129639
24 NUM_CC 0.7608978 1.314237
25 NUM_STOP 0.8861773 1.128442
26 NUM_PRI 0.7106962 1.407071
Since the VIF of the independent variables are less than 10, we can safely conclude that there are no signs of multi-collinearity among the independent variables.
ols_plot_resid_fit() of olsrr package is used to perform the linearity assumption test.
ols_plot_resid_fit(flat_mlr1)

The figure above reveals that most of the data points are scattered around the 0 line, we can therefore safely conclude that the relationships between the dependent variable and independent variables are linear.
ols_plot_resid_hist() of olsrr package is used to perform the normality assumption test.
ols_plot_resid_hist(flat_mlr1)

The figure reveals that the residuals of the multiple linear regression model (i.e. flat_mlr1) resembles normal distribution.
The hedonic model that we are building uses geographically referenced attributes, hence it is also important for us to visualise the residuals of the hedonic pricing model.
In order to perform spatial autocorrelation test, we need to convert the flat_resale_sf simple feature object into a SpatialPointsDataFrame.
First, we will export the residuals of the hedonic pricing model and save it as a data frame.
mlr_output <- as.data.frame(flat_mlr1$residuals)
Next, we will join the newly created data frame with the flat_resale_sf object.
flat_resale_res_sf <- cbind(flat_resale_sf,
mlr_output) %>%
rename(`MLR_RES` = `flat_mlr1.residuals`)
Next, we will convert the flat_resale_res_sf simple feature object into a SpatialPointsDataFrame. This is required because the spdep package can only process sp conformed spatial data objects.
flat_resale_sp <- as_Spatial(flat_resale_res_sf)
flat_resale_sp
class : SpatialPointsDataFrame
features : 15853
extent : 11597.31, 42623.63, 28217.39, 48741.06 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 34
names : PRICE, LOG_PRICE, AREA_SQM, LEASE_YRS, X01.TO.03, X07.TO.09, X04.TO.06, X10.TO.12, X13.TO.15, X25.TO.27, X19.TO.21, X16.TO.18, X28.TO.30, X31.TO.33, X22.TO.24, ...
min values : 218000, 12.2922503417712, 74, 45.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
max values : 1186888, 13.9868453136219, 138, 97, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Next, we will use tmap package to visualise the distribution of the residuals.
tmap_mode("view")
tm_shape(mpsz_sf)+
tm_polygons(alpha = 0.4) +
tm_shape(flat_resale_res_sf) +
tm_dots(col = "MLR_RES",
alpha = 0.6,
style="quantile") +
tm_view(set.zoom.limits = c(11,14))
The figure above reveals that there are signs of spatial autocorrelation. To prove that our observation is true, the Moran’s I test will be performed.
First, we will compute the distance-based weight matrix by using dnearneigh() function of spdep.
nb <- dnearneigh(coordinates(flat_resale_sp), 0, 1500, longlat = FALSE)
summary(nb)
Neighbour list object:
Number of regions: 15853
Number of nonzero links: 10144322
Percentage nonzero weights: 4.036455
Average number of links: 639.8992
Link number distribution:
2 14 25 31 46 56 58 60 63 65 68 70 71 72
3 14 20 6 47 4 4 5 1 2 4 1 12 1
78 85 86 88 89 90 91 96 97 99 100 104 106 109
77 2 1 5 4 3 1 7 3 7 4 2 2 2
110 111 112 115 117 118 119 120 121 122 123 124 125 126
6 5 1 6 7 3 1 7 12 7 19 6 58 18
128 130 131 132 133 135 136 137 138 139 140 141 142 143
29 10 19 4 7 7 4 9 22 5 8 9 13 3
144 145 147 148 149 150 151 152 153 154 155 156 157 158
13 6 19 1 4 3 6 20 15 14 2 10 4 9
159 160 161 164 165 166 167 168 170 171 172 173 174 175
3 5 16 9 19 12 6 3 3 15 9 13 1 10
176 177 178 179 180 181 182 183 184 185 186 187 188 189
16 14 16 27 19 3 10 7 11 29 38 9 6 2
190 191 192 193 194 195 196 197 198 199 200 201 202 203
7 14 9 4 4 2 3 6 22 12 20 25 13 4
204 205 206 207 208 209 211 212 213 214 215 216 217 218
12 3 3 21 27 2 3 4 4 10 25 7 1 6
219 220 221 222 223 224 225 226 227 228 229 230 231 232
10 21 28 17 1 36 10 4 20 22 5 13 12 8
233 234 235 236 237 238 239 240 241 242 243 244 245 246
6 13 39 19 2 14 7 36 23 5 11 12 5 13
247 248 249 250 251 252 253 254 255 256 257 258 259 260
19 17 10 14 28 14 8 12 9 16 15 9 4 5
261 262 263 264 265 266 267 268 269 270 271 272 273 274
12 11 19 12 14 17 15 30 6 7 37 64 16 16
275 276 277 278 279 280 281 282 283 284 285 286 287 288
17 31 60 3 45 52 24 26 25 26 27 47 49 45
289 290 291 292 293 294 295 296 297 298 299 300 301 302
25 37 57 27 27 50 18 9 30 21 10 30 38 32
303 304 305 306 307 308 309 310 311 312 313 314 315 316
14 27 27 17 18 5 2 15 18 10 26 24 8 18
317 318 319 320 321 322 323 324 325 326 327 328 329 330
44 12 23 40 18 15 30 38 23 28 29 28 18 39
331 332 333 334 335 336 337 338 339 340 341 342 343 344
5 18 16 14 27 7 62 25 21 31 29 6 33 14
345 346 347 348 349 350 351 352 353 354 355 356 357 358
12 11 31 42 29 13 14 42 18 17 31 14 25 23
359 360 361 362 363 364 365 366 367 368 369 370 371 372
31 21 13 30 31 7 28 18 39 16 19 32 22 10
373 374 375 376 377 378 379 380 381 382 383 384 385 386
45 23 10 61 9 24 13 17 12 18 11 20 29 17
387 388 389 390 391 392 393 394 395 396 397 398 399 400
47 38 28 22 5 30 5 8 33 20 24 16 28 18
401 402 403 404 405 406 407 408 409 410 411 412 413 414
22 20 21 24 16 4 14 24 19 14 22 33 5 22
415 416 417 418 419 420 421 422 423 424 425 426 427 428
22 8 12 76 8 20 22 49 41 18 13 33 28 23
429 430 431 432 433 434 435 436 437 438 439 440 441 442
20 25 11 16 40 19 32 29 19 22 26 13 19 9
443 444 445 446 447 448 449 450 451 452 453 454 455 456
6 6 26 10 19 33 15 29 9 32 18 11 25 13
457 458 459 460 461 462 463 464 465 466 467 468 469 470
72 23 15 10 14 12 9 38 11 5 24 23 3 35
471 472 473 474 475 476 477 478 479 480 481 482 483 484
13 21 14 14 21 15 24 23 27 26 17 4 12 18
485 486 487 488 489 490 491 492 493 494 495 496 497 498
14 15 9 23 25 17 10 7 27 16 3 9 20 1
499 500 501 502 503 504 505 506 507 508 509 510 511 512
5 10 10 18 19 8 36 27 17 16 19 39 12 29
513 514 515 516 517 518 519 520 521 522 523 524 525 526
12 14 16 8 4 13 2 4 16 7 13 10 12 11
527 528 529 530 531 532 533 534 535 536 537 538 539 540
20 36 14 17 17 7 17 14 10 22 5 6 9 4
541 542 543 544 545 546 548 549 550 551 553 554 555 556
4 13 14 18 17 12 7 12 14 18 6 16 8 16
557 558 559 560 561 562 563 564 565 566 567 568 569 570
14 12 23 5 6 5 19 13 20 14 32 22 14 4
571 572 573 574 575 576 577 578 579 580 581 582 583 584
22 4 33 27 26 11 22 6 18 14 4 7 8 12
585 586 587 588 589 590 591 592 593 594 595 596 597 598
12 12 10 30 11 18 26 21 3 18 8 20 11 9
599 600 601 602 603 604 605 606 607 608 609 610 611 612
16 14 14 3 22 6 18 13 5 10 25 14 22 5
613 614 615 616 617 618 619 620 621 622 623 624 625 626
2 12 33 3 33 14 5 6 11 6 11 16 3 13
627 628 629 630 631 632 633 634 635 636 637 638 639 640
23 4 6 7 18 7 13 10 18 7 19 7 5 9
641 642 643 644 645 646 647 648 649 650 651 652 653 654
15 17 2 10 34 1 17 7 17 24 8 10 10 6
655 656 657 658 659 660 661 663 665 667 668 669 670 671
5 6 9 23 10 5 5 7 5 6 37 21 6 4
672 673 674 675 676 677 678 679 680 681 682 683 685 686
12 3 14 7 5 9 14 16 6 7 30 53 8 5
687 688 689 690 691 692 693 694 695 696 697 698 699 700
5 7 21 5 19 7 10 10 10 16 24 6 20 7
701 702 703 704 705 706 707 708 709 710 711 712 713 714
4 10 19 7 17 4 8 3 4 30 7 2 8 13
715 716 717 718 719 720 721 722 723 725 726 728 729 730
34 8 23 10 11 9 4 15 7 2 24 2 6 24
731 732 733 734 735 736 737 738 739 740 741 742 743 744
6 2 2 6 7 1 11 7 4 13 9 7 1 17
745 746 747 748 749 750 751 752 753 754 755 756 757 758
15 18 22 29 19 26 5 5 2 1 16 9 4 23
759 760 761 762 763 764 765 766 767 768 769 770 771 772
10 7 16 3 5 22 29 26 1 21 9 17 3 20
773 774 775 776 777 778 779 780 781 782 783 784 785 786
13 16 18 21 14 6 34 33 22 11 1 3 15 11
787 788 789 790 791 792 793 794 795 796 797 798 799 800
3 1 9 16 24 17 4 10 3 56 20 15 9 8
802 803 804 805 806 807 808 809 810 811 812 813 814 815
29 2 22 7 20 19 30 35 15 7 19 7 10 5
816 817 818 819 820 821 822 823 824 825 826 827 828 829
27 24 25 21 12 5 11 18 12 2 3 29 14 15
830 831 832 833 834 835 836 837 838 839 840 841 842 843
12 11 3 31 11 30 9 31 6 5 10 17 10 4
844 845 846 847 848 849 850 851 852 853 854 855 856 857
25 6 15 17 13 32 9 19 5 34 8 8 12 17
859 860 861 862 863 864 865 866 867 868 869 870 871 872
1 20 38 7 6 8 20 19 7 2 6 4 2 1
873 874 875 876 877 878 879 880 881 882 883 884 885 886
4 16 19 9 2 9 1 4 7 14 14 10 6 20
888 889 890 891 892 893 894 895 896 897 898 899 900 901
19 13 1 17 4 9 7 7 7 13 15 9 7 10
902 903 904 905 906 907 908 909 910 911 912 914 915 916
10 35 9 6 8 2 8 10 18 2 5 7 3 5
917 918 919 920 921 922 923 924 925 926 927 928 929 930
32 11 12 10 26 4 5 8 4 22 18 4 2 6
932 934 935 936 937 938 940 941 942 943 945 946 947 948
10 13 1 30 1 8 2 10 3 2 4 1 1 10
949 950 951 952 955 957 959 960 961 963 964 965 966 967
4 20 8 6 1 2 7 12 5 11 18 1 16 6
968 969 971 972 973 974 975 977 979 980 981 982 984 985
4 2 21 2 16 5 4 34 14 6 3 9 6 1
986 988 991 992 993 995 996 997 998 999 1000 1001 1003 1005
3 21 10 1 8 7 10 1 21 10 2 3 1 16
1006 1007 1008 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1021
9 7 2 4 1 4 1 15 13 11 11 19 6 6
1022 1023 1024 1025 1026 1027 1028 1029 1031 1032 1033 1034 1035 1037
8 9 1 16 7 3 8 18 2 18 6 3 10 3
1038 1039 1040 1041 1042 1043 1044 1045 1046 1050 1051 1052 1053 1055
18 10 9 5 25 2 11 6 9 4 3 5 21 1
1056 1057 1058 1059 1060 1061 1064 1065 1066 1067 1071 1072 1076 1077
7 15 3 8 1 3 4 8 1 7 3 6 9 9
1079 1081 1082 1085 1086 1087 1088 1089 1091 1092 1094 1095 1097 1099
7 1 11 3 5 19 6 3 17 3 11 2 5 5
1102 1103 1104 1105 1107 1109 1110 1112 1113 1114 1116 1117 1118 1119
6 14 5 11 4 2 5 12 3 12 3 13 4 7
1120 1121 1123 1124 1128 1129 1130 1132 1133 1134 1135 1136 1137 1138
3 8 4 4 1 11 10 2 9 8 2 21 5 11
1139 1140 1142 1143 1144 1145 1146 1147 1148 1149 1150 1152 1153 1154
13 21 7 8 4 14 4 5 3 6 3 8 9 7
1155 1156 1157 1158 1159 1160 1161 1162 1164 1165 1168 1169 1170 1172
2 1 32 1 9 5 15 4 14 6 21 19 17 7
1173 1174 1175 1176 1177 1178 1180 1182 1183 1185 1186 1187 1188 1191
28 15 2 5 4 7 2 19 8 8 11 20 6 7
1194 1195 1196 1197 1198 1200 1201 1203 1204 1205 1206 1207 1209 1210
6 7 8 13 16 9 10 4 4 13 1 14 11 3
1214 1215 1216 1217 1218 1219 1220 1221 1222 1224 1225 1226 1227 1228
10 6 16 7 31 40 9 22 16 13 7 7 11 4
1230 1231 1232 1234 1235 1237 1238 1239 1240 1241 1242 1243 1245 1246
13 10 10 32 2 25 3 1 8 4 11 8 1 2
1249 1250 1251 1254 1256 1257 1259 1260 1261 1263 1264 1265 1266 1267
1 5 7 4 6 2 2 10 1 3 1 2 19 12
1268 1269 1270 1271 1274 1277 1278 1279 1280 1281 1282 1283 1284 1285
7 6 3 3 1 7 16 11 7 1 13 1 1 10
1286 1287 1289 1290 1291 1292 1296 1297 1298 1300 1301 1302 1303 1304
9 2 5 5 4 7 7 3 6 5 3 6 4 22
1305 1306 1307 1308 1309 1310 1311 1313 1315 1318 1319 1320 1321 1322
12 12 5 2 6 21 3 1 17 1 21 3 3 6
1324 1326 1328 1329 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340
12 1 9 1 10 11 6 9 23 13 3 1 3 9
1341 1342 1343 1344 1346 1347 1349 1351 1352 1355 1356 1357 1358 1362
13 22 18 5 1 22 6 6 3 8 23 6 2 27
1363 1364 1365 1367 1368 1370 1372 1373 1375 1376 1377 1378 1379 1380
5 2 1 7 12 12 7 6 11 4 11 6 19 3
1381 1383 1385 1387 1388 1390 1391 1392 1393 1395 1396 1397 1399 1400
2 16 1 7 10 24 5 3 7 5 17 6 5 9
1401 1402 1403 1404 1405 1408 1409 1412 1414 1417 1420 1422 1423 1427
9 9 11 3 1 6 5 5 5 3 3 3 2 1
1428 1429 1431 1434 1435 1439 1441 1442 1443 1446 1450 1451 1452 1453
1 1 11 1 1 2 2 4 17 6 2 1 1 8
1454 1458 1463 1464 1465 1466 1467 1469 1470 1473 1478 1479 1480 1483
11 3 1 3 2 5 2 1 7 2 1 3 9 1
1484 1486 1487 1488 1491 1492 1495 1496 1498 1509 1513 1517 1520 1526
2 7 5 1 7 2 1 20 1 2 3 3 2 6
1530 1531 1533 1537 1538 1539 1542 1546 1547 1552 1555 1557 1559 1563
4 12 1 3 2 3 1 2 7 4 2 3 2 8
1564 1567 1571 1577 1578 1582 1583 1587 1592 1594 1603 1608 1612 1614
7 1 2 4 2 8 3 3 6 2 9 1 1 2
1616 1620 1628 1630 1638 1644 1645 1669 1678 1687 1691 1700 1718 1728
2 2 2 16 1 3 4 1 1 1 1 5 1 1
1731 1735 1746 1748 1759 1762 1767 1768 1793 1799 1802 1817 1845 1868
3 5 3 2 3 4 4 1 4 5 4 1 3 3
1875 1913 1942 1956
1 2 5 4
3 least connected regions:
3083 4737 5544 with 2 links
4 most connected regions:
3250 8144 12521 13498 with 1956 links
Next, nb2listw() of spdep packge will be used to convert the output neighbours lists (i.e. nb) into a spatial weights.
nb_lw <- nb2listw(nb, style = 'W')
summary(nb_lw)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 15853
Number of nonzero links: 10144322
Percentage nonzero weights: 4.036455
Average number of links: 639.8992
Link number distribution:
2 14 25 31 46 56 58 60 63 65 68 70 71 72
3 14 20 6 47 4 4 5 1 2 4 1 12 1
78 85 86 88 89 90 91 96 97 99 100 104 106 109
77 2 1 5 4 3 1 7 3 7 4 2 2 2
110 111 112 115 117 118 119 120 121 122 123 124 125 126
6 5 1 6 7 3 1 7 12 7 19 6 58 18
128 130 131 132 133 135 136 137 138 139 140 141 142 143
29 10 19 4 7 7 4 9 22 5 8 9 13 3
144 145 147 148 149 150 151 152 153 154 155 156 157 158
13 6 19 1 4 3 6 20 15 14 2 10 4 9
159 160 161 164 165 166 167 168 170 171 172 173 174 175
3 5 16 9 19 12 6 3 3 15 9 13 1 10
176 177 178 179 180 181 182 183 184 185 186 187 188 189
16 14 16 27 19 3 10 7 11 29 38 9 6 2
190 191 192 193 194 195 196 197 198 199 200 201 202 203
7 14 9 4 4 2 3 6 22 12 20 25 13 4
204 205 206 207 208 209 211 212 213 214 215 216 217 218
12 3 3 21 27 2 3 4 4 10 25 7 1 6
219 220 221 222 223 224 225 226 227 228 229 230 231 232
10 21 28 17 1 36 10 4 20 22 5 13 12 8
233 234 235 236 237 238 239 240 241 242 243 244 245 246
6 13 39 19 2 14 7 36 23 5 11 12 5 13
247 248 249 250 251 252 253 254 255 256 257 258 259 260
19 17 10 14 28 14 8 12 9 16 15 9 4 5
261 262 263 264 265 266 267 268 269 270 271 272 273 274
12 11 19 12 14 17 15 30 6 7 37 64 16 16
275 276 277 278 279 280 281 282 283 284 285 286 287 288
17 31 60 3 45 52 24 26 25 26 27 47 49 45
289 290 291 292 293 294 295 296 297 298 299 300 301 302
25 37 57 27 27 50 18 9 30 21 10 30 38 32
303 304 305 306 307 308 309 310 311 312 313 314 315 316
14 27 27 17 18 5 2 15 18 10 26 24 8 18
317 318 319 320 321 322 323 324 325 326 327 328 329 330
44 12 23 40 18 15 30 38 23 28 29 28 18 39
331 332 333 334 335 336 337 338 339 340 341 342 343 344
5 18 16 14 27 7 62 25 21 31 29 6 33 14
345 346 347 348 349 350 351 352 353 354 355 356 357 358
12 11 31 42 29 13 14 42 18 17 31 14 25 23
359 360 361 362 363 364 365 366 367 368 369 370 371 372
31 21 13 30 31 7 28 18 39 16 19 32 22 10
373 374 375 376 377 378 379 380 381 382 383 384 385 386
45 23 10 61 9 24 13 17 12 18 11 20 29 17
387 388 389 390 391 392 393 394 395 396 397 398 399 400
47 38 28 22 5 30 5 8 33 20 24 16 28 18
401 402 403 404 405 406 407 408 409 410 411 412 413 414
22 20 21 24 16 4 14 24 19 14 22 33 5 22
415 416 417 418 419 420 421 422 423 424 425 426 427 428
22 8 12 76 8 20 22 49 41 18 13 33 28 23
429 430 431 432 433 434 435 436 437 438 439 440 441 442
20 25 11 16 40 19 32 29 19 22 26 13 19 9
443 444 445 446 447 448 449 450 451 452 453 454 455 456
6 6 26 10 19 33 15 29 9 32 18 11 25 13
457 458 459 460 461 462 463 464 465 466 467 468 469 470
72 23 15 10 14 12 9 38 11 5 24 23 3 35
471 472 473 474 475 476 477 478 479 480 481 482 483 484
13 21 14 14 21 15 24 23 27 26 17 4 12 18
485 486 487 488 489 490 491 492 493 494 495 496 497 498
14 15 9 23 25 17 10 7 27 16 3 9 20 1
499 500 501 502 503 504 505 506 507 508 509 510 511 512
5 10 10 18 19 8 36 27 17 16 19 39 12 29
513 514 515 516 517 518 519 520 521 522 523 524 525 526
12 14 16 8 4 13 2 4 16 7 13 10 12 11
527 528 529 530 531 532 533 534 535 536 537 538 539 540
20 36 14 17 17 7 17 14 10 22 5 6 9 4
541 542 543 544 545 546 548 549 550 551 553 554 555 556
4 13 14 18 17 12 7 12 14 18 6 16 8 16
557 558 559 560 561 562 563 564 565 566 567 568 569 570
14 12 23 5 6 5 19 13 20 14 32 22 14 4
571 572 573 574 575 576 577 578 579 580 581 582 583 584
22 4 33 27 26 11 22 6 18 14 4 7 8 12
585 586 587 588 589 590 591 592 593 594 595 596 597 598
12 12 10 30 11 18 26 21 3 18 8 20 11 9
599 600 601 602 603 604 605 606 607 608 609 610 611 612
16 14 14 3 22 6 18 13 5 10 25 14 22 5
613 614 615 616 617 618 619 620 621 622 623 624 625 626
2 12 33 3 33 14 5 6 11 6 11 16 3 13
627 628 629 630 631 632 633 634 635 636 637 638 639 640
23 4 6 7 18 7 13 10 18 7 19 7 5 9
641 642 643 644 645 646 647 648 649 650 651 652 653 654
15 17 2 10 34 1 17 7 17 24 8 10 10 6
655 656 657 658 659 660 661 663 665 667 668 669 670 671
5 6 9 23 10 5 5 7 5 6 37 21 6 4
672 673 674 675 676 677 678 679 680 681 682 683 685 686
12 3 14 7 5 9 14 16 6 7 30 53 8 5
687 688 689 690 691 692 693 694 695 696 697 698 699 700
5 7 21 5 19 7 10 10 10 16 24 6 20 7
701 702 703 704 705 706 707 708 709 710 711 712 713 714
4 10 19 7 17 4 8 3 4 30 7 2 8 13
715 716 717 718 719 720 721 722 723 725 726 728 729 730
34 8 23 10 11 9 4 15 7 2 24 2 6 24
731 732 733 734 735 736 737 738 739 740 741 742 743 744
6 2 2 6 7 1 11 7 4 13 9 7 1 17
745 746 747 748 749 750 751 752 753 754 755 756 757 758
15 18 22 29 19 26 5 5 2 1 16 9 4 23
759 760 761 762 763 764 765 766 767 768 769 770 771 772
10 7 16 3 5 22 29 26 1 21 9 17 3 20
773 774 775 776 777 778 779 780 781 782 783 784 785 786
13 16 18 21 14 6 34 33 22 11 1 3 15 11
787 788 789 790 791 792 793 794 795 796 797 798 799 800
3 1 9 16 24 17 4 10 3 56 20 15 9 8
802 803 804 805 806 807 808 809 810 811 812 813 814 815
29 2 22 7 20 19 30 35 15 7 19 7 10 5
816 817 818 819 820 821 822 823 824 825 826 827 828 829
27 24 25 21 12 5 11 18 12 2 3 29 14 15
830 831 832 833 834 835 836 837 838 839 840 841 842 843
12 11 3 31 11 30 9 31 6 5 10 17 10 4
844 845 846 847 848 849 850 851 852 853 854 855 856 857
25 6 15 17 13 32 9 19 5 34 8 8 12 17
859 860 861 862 863 864 865 866 867 868 869 870 871 872
1 20 38 7 6 8 20 19 7 2 6 4 2 1
873 874 875 876 877 878 879 880 881 882 883 884 885 886
4 16 19 9 2 9 1 4 7 14 14 10 6 20
888 889 890 891 892 893 894 895 896 897 898 899 900 901
19 13 1 17 4 9 7 7 7 13 15 9 7 10
902 903 904 905 906 907 908 909 910 911 912 914 915 916
10 35 9 6 8 2 8 10 18 2 5 7 3 5
917 918 919 920 921 922 923 924 925 926 927 928 929 930
32 11 12 10 26 4 5 8 4 22 18 4 2 6
932 934 935 936 937 938 940 941 942 943 945 946 947 948
10 13 1 30 1 8 2 10 3 2 4 1 1 10
949 950 951 952 955 957 959 960 961 963 964 965 966 967
4 20 8 6 1 2 7 12 5 11 18 1 16 6
968 969 971 972 973 974 975 977 979 980 981 982 984 985
4 2 21 2 16 5 4 34 14 6 3 9 6 1
986 988 991 992 993 995 996 997 998 999 1000 1001 1003 1005
3 21 10 1 8 7 10 1 21 10 2 3 1 16
1006 1007 1008 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1021
9 7 2 4 1 4 1 15 13 11 11 19 6 6
1022 1023 1024 1025 1026 1027 1028 1029 1031 1032 1033 1034 1035 1037
8 9 1 16 7 3 8 18 2 18 6 3 10 3
1038 1039 1040 1041 1042 1043 1044 1045 1046 1050 1051 1052 1053 1055
18 10 9 5 25 2 11 6 9 4 3 5 21 1
1056 1057 1058 1059 1060 1061 1064 1065 1066 1067 1071 1072 1076 1077
7 15 3 8 1 3 4 8 1 7 3 6 9 9
1079 1081 1082 1085 1086 1087 1088 1089 1091 1092 1094 1095 1097 1099
7 1 11 3 5 19 6 3 17 3 11 2 5 5
1102 1103 1104 1105 1107 1109 1110 1112 1113 1114 1116 1117 1118 1119
6 14 5 11 4 2 5 12 3 12 3 13 4 7
1120 1121 1123 1124 1128 1129 1130 1132 1133 1134 1135 1136 1137 1138
3 8 4 4 1 11 10 2 9 8 2 21 5 11
1139 1140 1142 1143 1144 1145 1146 1147 1148 1149 1150 1152 1153 1154
13 21 7 8 4 14 4 5 3 6 3 8 9 7
1155 1156 1157 1158 1159 1160 1161 1162 1164 1165 1168 1169 1170 1172
2 1 32 1 9 5 15 4 14 6 21 19 17 7
1173 1174 1175 1176 1177 1178 1180 1182 1183 1185 1186 1187 1188 1191
28 15 2 5 4 7 2 19 8 8 11 20 6 7
1194 1195 1196 1197 1198 1200 1201 1203 1204 1205 1206 1207 1209 1210
6 7 8 13 16 9 10 4 4 13 1 14 11 3
1214 1215 1216 1217 1218 1219 1220 1221 1222 1224 1225 1226 1227 1228
10 6 16 7 31 40 9 22 16 13 7 7 11 4
1230 1231 1232 1234 1235 1237 1238 1239 1240 1241 1242 1243 1245 1246
13 10 10 32 2 25 3 1 8 4 11 8 1 2
1249 1250 1251 1254 1256 1257 1259 1260 1261 1263 1264 1265 1266 1267
1 5 7 4 6 2 2 10 1 3 1 2 19 12
1268 1269 1270 1271 1274 1277 1278 1279 1280 1281 1282 1283 1284 1285
7 6 3 3 1 7 16 11 7 1 13 1 1 10
1286 1287 1289 1290 1291 1292 1296 1297 1298 1300 1301 1302 1303 1304
9 2 5 5 4 7 7 3 6 5 3 6 4 22
1305 1306 1307 1308 1309 1310 1311 1313 1315 1318 1319 1320 1321 1322
12 12 5 2 6 21 3 1 17 1 21 3 3 6
1324 1326 1328 1329 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340
12 1 9 1 10 11 6 9 23 13 3 1 3 9
1341 1342 1343 1344 1346 1347 1349 1351 1352 1355 1356 1357 1358 1362
13 22 18 5 1 22 6 6 3 8 23 6 2 27
1363 1364 1365 1367 1368 1370 1372 1373 1375 1376 1377 1378 1379 1380
5 2 1 7 12 12 7 6 11 4 11 6 19 3
1381 1383 1385 1387 1388 1390 1391 1392 1393 1395 1396 1397 1399 1400
2 16 1 7 10 24 5 3 7 5 17 6 5 9
1401 1402 1403 1404 1405 1408 1409 1412 1414 1417 1420 1422 1423 1427
9 9 11 3 1 6 5 5 5 3 3 3 2 1
1428 1429 1431 1434 1435 1439 1441 1442 1443 1446 1450 1451 1452 1453
1 1 11 1 1 2 2 4 17 6 2 1 1 8
1454 1458 1463 1464 1465 1466 1467 1469 1470 1473 1478 1479 1480 1483
11 3 1 3 2 5 2 1 7 2 1 3 9 1
1484 1486 1487 1488 1491 1492 1495 1496 1498 1509 1513 1517 1520 1526
2 7 5 1 7 2 1 20 1 2 3 3 2 6
1530 1531 1533 1537 1538 1539 1542 1546 1547 1552 1555 1557 1559 1563
4 12 1 3 2 3 1 2 7 4 2 3 2 8
1564 1567 1571 1577 1578 1582 1583 1587 1592 1594 1603 1608 1612 1614
7 1 2 4 2 8 3 3 6 2 9 1 1 2
1616 1620 1628 1630 1638 1644 1645 1669 1678 1687 1691 1700 1718 1728
2 2 2 16 1 3 4 1 1 1 1 5 1 1
1731 1735 1746 1748 1759 1762 1767 1768 1793 1799 1802 1817 1845 1868
3 5 3 2 3 4 4 1 4 5 4 1 3 3
1875 1913 1942 1956
1 2 5 4
3 least connected regions:
3083 4737 5544 with 2 links
4 most connected regions:
3250 8144 12521 13498 with 1956 links
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 15853 251317609 15853 79.46042 63911.22
Next, lm.morantest() of spdep package will be used to perform Moran’s I test for residual spatial autocorrelation
lm.morantest(flat_mlr1, nb_lw)
Global Moran I for regression residuals
data:
model: lm(formula = PRICE ~ AREA_SQM + LEASE_YRS + X01.TO.03 +
X04.TO.06 + X07.TO.09 + X10.TO.12 + X13.TO.15 + X25.TO.27 +
X28.TO.30 + X31.TO.33 + X34.TO.36 + X37.TO.39 + X40.TO.42 +
X43.TO.45 + X46.TO.48 + X49.TO.51 + PROX_EC + PROX_HA +
PROX_MRT + PROX_GDPRI + PROX_MALL + PROX_SUP + NUM_KIN +
NUM_CC + NUM_STOP + NUM_PRI, data = flat_resale_sf)
weights: nb_lw
Moran I statistic standard deviate = 921.57, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Observed Moran I Expectation Variance
4.713397e-01 -3.669543e-04 2.619936e-07
The Global Moran’s I test for residual spatial autocorrelation shows that its p-value is less than the alpha value of 0.05. We will thus reject the null hypothesis that the residuals are randomly distributed.
Since the Observed Global Moran I = 0.471 which is greater than 0, we can infer than the residuals resemble cluster distribution.
We will calibrate the GWR-based hedonic pricing model using the adaptive bandwidth approach.
We first use bw.gwr() to determine the recommended data point to use.
bw_adaptive <- bw.gwr(formula = PRICE ~ AREA_SQM + LEASE_YRS + PROX_EC +
PROX_HA + PROX_MRT + PROX_GDPRI + PROX_MALL + PROX_SUP +
NUM_KIN + NUM_CC + NUM_STOP + NUM_PRI,
data=flat_resale_sp, approach="CV", kernel="gaussian",
adaptive=TRUE, longlat=FALSE)
Take a cup of tea and have a break, it will take a few minutes.
-----A kind suggestion from GWmodel development group
Adaptive bandwidth: 9805 CV score: 9.263168e+13
Adaptive bandwidth: 6068 CV score: 8.315779e+13
Adaptive bandwidth: 3757 CV score: 7.260491e+13
Adaptive bandwidth: 2330 CV score: 6.088141e+13
Adaptive bandwidth: 1446 CV score: 4.445911e+13
Adaptive bandwidth: 902 CV score: 3.45399e+13
Adaptive bandwidth: 563 CV score: 2.884093e+13
Adaptive bandwidth: 356 CV score: 2.453133e+13
Adaptive bandwidth: 225 CV score: 2.114838e+13
Adaptive bandwidth: 147 CV score: 1.881623e+13
Adaptive bandwidth: 96 CV score: 1.701977e+13
Adaptive bandwidth: 67 CV score: 1.596888e+13
Adaptive bandwidth: 46 CV score: Inf
Adaptive bandwidth: 76 CV score: 1.623633e+13
Adaptive bandwidth: 57 CV score: 1.564312e+13
Adaptive bandwidth: 55 CV score: 1.558171e+13
Adaptive bandwidth: 49 CV score: Inf
Adaptive bandwidth: 53 CV score: 1.543224e+13
Adaptive bandwidth: 58 CV score: 1.56816e+13
Adaptive bandwidth: 56 CV score: 1.562018e+13
Adaptive bandwidth: 57 CV score: 1.564312e+13
Adaptive bandwidth: 56 CV score: 1.562018e+13
Adaptive bandwidth: 56 CV score: 1.562018e+13
Adaptive bandwidth: 55 CV score: 1.558171e+13
Adaptive bandwidth: 55 CV score: 1.558171e+13
Adaptive bandwidth: 54 CV score: 1.553114e+13
Adaptive bandwidth: 54 CV score: 1.553114e+13
Adaptive bandwidth: 53 CV score: 1.543224e+13
The result shows that the recommended data point is 57. We will then pass this value in while computing the adaptive bandwidth GWR model.
We can now go ahead to calibrate the GWR-based hedonic pricing model using the adaptive bandwidth and gaussian kernel.
gwr_adaptive <- gwr.basic(formula = PRICE ~ AREA_SQM + LEASE_YRS + PROX_EC +
PROX_HA + PROX_MRT + PROX_GDPRI + PROX_MALL + PROX_SUP +
NUM_KIN + NUM_CC + NUM_STOP + NUM_PRI,
data=flat_resale_sp, bw=bw_adaptive,
kernel = 'gaussian', adaptive=TRUE, longlat = FALSE)
gwr_adaptive
***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2021-11-06 00:42:43
Call:
gwr.basic(formula = PRICE ~ AREA_SQM + LEASE_YRS + PROX_EC +
PROX_HA + PROX_MRT + PROX_GDPRI + PROX_MALL + PROX_SUP +
NUM_KIN + NUM_CC + NUM_STOP + NUM_PRI, data = flat_resale_sp,
bw = bw_adaptive, kernel = "gaussian", adaptive = TRUE, longlat = FALSE)
Dependent (y) variable: PRICE
Independent variables: AREA_SQM LEASE_YRS PROX_EC PROX_HA PROX_MRT PROX_GDPRI PROX_MALL PROX_SUP NUM_KIN NUM_CC NUM_STOP NUM_PRI
Number of data points: 15853
***********************************************************************
* Results of Global Regression *
***********************************************************************
Call:
lm(formula = formula, data = data)
Residuals:
Min 1Q Median 3Q Max
-239991 -48552 -9613 39391 603257
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.742e+05 1.127e+04 24.334 <2e-16 ***
AREA_SQM 1.466e+03 9.613e+01 15.248 <2e-16 ***
LEASE_YRS 4.105e+03 5.623e+01 73.004 <2e-16 ***
PROX_EC -3.248e+01 1.036e+00 -31.362 <2e-16 ***
PROX_HA -5.108e+01 1.326e+00 -38.528 <2e-16 ***
PROX_MRT -5.963e+01 1.815e+00 -32.848 <2e-16 ***
PROX_GDPRI -1.254e+01 2.455e-01 -51.077 <2e-16 ***
PROX_MALL -4.594e+01 1.984e+00 -23.151 <2e-16 ***
PROX_SUP -3.998e+01 4.426e+00 -9.031 <2e-16 ***
NUM_KIN 1.159e+04 6.791e+02 17.060 <2e-16 ***
NUM_CC -5.590e+03 3.771e+02 -14.822 <2e-16 ***
NUM_STOP -2.153e+03 2.377e+02 -9.060 <2e-16 ***
NUM_PRI -2.448e+04 4.921e+02 -49.736 <2e-16 ***
---Significance stars
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 81800 on 15840 degrees of freedom
Multiple R-squared: 0.5372
Adjusted R-squared: 0.5369
F-statistic: 1532 on 12 and 15840 DF, p-value: < 2.2e-16
***Extra Diagnostic information
Residual sum of squares: 1.059802e+14
Sigma(hat): 81768.11
AIC: 403661.8
AICc: 403661.8
BIC: 388051.6
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: gaussian
Adaptive bandwidth: 53 (number of nearest neighbours)
Regression points: the same locations as observations are used.
Distance metric: Euclidean distance metric is used.
****************Summary of GWR coefficient estimates:******************
Min. 1st Qu. Median 3rd Qu.
Intercept -1.0236e+08 -3.7350e+05 -1.1166e+05 1.7093e+05
AREA_SQM -1.1656e+04 1.4608e+03 2.2916e+03 3.8652e+03
LEASE_YRS -2.1344e+04 3.2116e+03 5.1528e+03 7.2816e+03
PROX_EC -1.7900e+04 -5.3473e+01 6.0407e+00 5.2218e+01
PROX_HA -9.4910e+04 -4.7023e+01 2.5105e+00 7.1890e+01
PROX_MRT -7.0885e+04 -1.2639e+02 -5.7240e+01 3.7897e+00
PROX_GDPRI -1.5787e+04 -6.0603e+01 -6.7086e+00 4.4177e+01
PROX_MALL -7.0324e+04 -9.0740e+01 -2.4896e+01 4.3880e+01
PROX_SUP -3.5381e+03 -5.8118e+01 -1.1337e+01 4.6861e+01
NUM_KIN -5.0136e+06 -7.4942e+03 -6.6209e+02 4.8968e+03
NUM_CC -1.2328e+05 -2.4903e+03 2.3985e+02 3.2078e+03
NUM_STOP -3.5763e+04 -1.9891e+03 4.1286e+01 2.1538e+03
NUM_PRI -6.5536e+05 -8.1013e+03 -1.2313e+03 4.2885e+03
Max.
Intercept 129458402
AREA_SQM 224871
LEASE_YRS 32079
PROX_EC 77518
PROX_HA 81068
PROX_MRT 33093
PROX_GDPRI 18186
PROX_MALL 45254
PROX_SUP 45269
NUM_KIN 257280
NUM_CC 201473
NUM_STOP 124415
NUM_PRI 303858
************************Diagnostic information*************************
Number of data points: 15853
Effective number of parameters (2trace(S) - trace(S'S)): 1673.642
Effective degrees of freedom (n-2trace(S) + trace(S'S)): 14179.36
AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 372896.5
AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 371285.5
BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 367182.2
Residual sum of squares: 1.264499e+13
R-square value: 0.9447844
Adjusted R-square value: 0.9382667
***********************************************************************
Program stops at: 2021-11-06 00:52:54
The report shows that the adjusted r-square of the GWR is 0.9382667 which is significantly better than the global multiple linear regression model of 0.5369.
In addition to regression residuals, the output feature class table includes:
They are all stored in a SpatialPointsDataFrame or SpatialPolygonsDataFrame object integrated with fit.points, GWR coefficient estimates, y value, predicted values, coefficient standard errors and t-values in its “data” slot in an object called SDF of the output list.
To visualise the fields in SDF, we need to first covert it into sf data frame.
flat_resale_sf_adaptive <- st_as_sf(gwr_adaptive$SDF) %>%
st_transform(crs=3414)
gwr_adaptive_output <- as.data.frame(gwr_adaptive$SDF)
flat_resale_sf_adaptive <- cbind(flat_resale_res_sf,
as.matrix(gwr_adaptive_output))
glimpse(flat_resale_sf_adaptive)
Rows: 15,853
Columns: 82
$ PRICE <dbl> 330000, 360000, 370000, 375000, 380000, 380000~
$ LOG_PRICE <dbl> 12.70685, 12.79386, 12.82126, 12.83468, 12.847~
$ AREA_SQM <dbl> 92, 91, 92, 99, 92, 92, 92, 92, 93, 91, 91, 98~
$ LEASE_YRS <dbl> 57.00, 61.50, 61.08, 58.33, 59.58, 61.00, 58.8~
$ X01.TO.03 <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X07.TO.09 <dbl> 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0~
$ X04.TO.06 <dbl> 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0~
$ X10.TO.12 <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0~
$ X13.TO.15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1~
$ X25.TO.27 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X19.TO.21 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X16.TO.18 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X28.TO.30 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X31.TO.33 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X22.TO.24 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X37.TO.39 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X34.TO.36 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X40.TO.42 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X43.TO.45 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X46.TO.48 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X49.TO.51 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ PROX_CBD <dbl> 8824.749, 9841.309, 9560.780, 9609.731, 8351.0~
$ PROX_EC <dbl> 251.4065, 631.8448, 1082.4168, 347.3195, 195.8~
$ PROX_HA <dbl> 441.82653, 269.72560, 258.29513, 436.47518, 70~
$ PROX_MRT <dbl> 703.9715, 1096.9096, 889.9529, 1409.3169, 886.~
$ PROX_PARK <dbl> 730.3122, 645.3451, 743.7891, 808.8585, 951.45~
$ PROX_GDPRI <dbl> 1426.311, 1735.175, 2069.003, 1431.997, 2409.0~
$ PROX_MALL <dbl> 546.0414, 1452.7304, 1030.7976, 1514.3816, 994~
$ PROX_SUP <dbl> 270.8222, 310.1889, 318.7560, 458.6748, 337.98~
$ NUM_KIN <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 2~
$ NUM_CC <dbl> 6, 5, 2, 3, 3, 2, 3, 4, 3, 2, 4, 4, 4, 5, 2, 7~
$ NUM_STOP <dbl> 8, 8, 8, 7, 6, 9, 6, 6, 5, 4, 10, 5, 6, 9, 8, ~
$ NUM_PRI <dbl> 2, 2, 1, 2, 2, 1, 3, 2, 2, 2, 2, 2, 3, 2, 2, 3~
$ MLR_RES <dbl> -65028.840, -25545.406, -26560.907, 7843.605, ~
$ Intercept <dbl> -350761.80, -55327.20, -227355.59, -46943.06, ~
$ AREA_SQM.1 <dbl> 3865.1772, 1737.2207, 2625.1875, 1691.6949, 40~
$ LEASE_YRS.1 <dbl> 6611.712, 6424.919, 10006.102, 6298.735, 8284.~
$ PROX_EC.1 <dbl> -43.8754894, -2.8562872, -61.0260887, -0.88612~
$ PROX_HA.1 <dbl> -36.3592938, 37.0574321, -36.5263396, 87.79895~
$ PROX_MRT.1 <dbl> -168.34579, -16.77467, -90.13781, -41.06379, -~
$ PROX_GDPRI.1 <dbl> 34.898286, -27.915369, -24.089908, -44.094387,~
$ PROX_MALL.1 <dbl> 115.915461, -47.616086, -18.726036, -24.432638~
$ PROX_SUP.1 <dbl> 73.043248, 89.510041, -41.240030, 60.707395, -~
$ NUM_KIN.1 <dbl> 1575.8238, 12980.2395, 2033.8891, 9039.6359, -~
$ NUM_CC.1 <dbl> -2088.0613, 2556.1607, -2730.4295, 2805.3276, ~
$ NUM_STOP.1 <dbl> 7683.8487, -308.4065, 2569.5484, 633.8828, 549~
$ NUM_PRI.1 <dbl> -6256.462, -21381.637, -20925.171, -14726.568,~
$ y <dbl> 330000, 360000, 370000, 375000, 380000, 380000~
$ yhat <dbl> 406953.8, 378366.9, 383539.6, 388223.6, 373838~
$ residual <dbl> -76953.8297, -18366.8767, -13539.6453, -13223.~
$ CV_Score <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ Stud_residual <dbl> -2.79850481, -0.63629402, -0.47335355, -0.4649~
$ Intercept_SE <dbl> 86015.65, 57512.72, 134655.97, 55066.51, 15457~
$ AREA_SQM_SE <dbl> 626.1058, 488.0151, 1123.0209, 454.2593, 1204.~
$ LEASE_YRS_SE <dbl> 215.2578, 252.7942, 295.7688, 208.5840, 367.59~
$ PROX_EC_SE <dbl> 41.16518, 17.02008, 19.19507, 14.00328, 23.571~
$ PROX_HA_SE <dbl> 33.139552, 23.558663, 37.946592, 21.782462, 38~
$ PROX_MRT_SE <dbl> 72.61696, 26.28474, 29.59957, 26.01910, 42.299~
$ PROX_GDPRI_SE <dbl> 30.225445, 17.937704, 18.894105, 16.530167, 22~
$ PROX_MALL_SE <dbl> 71.316385, 18.273429, 29.884950, 19.572021, 43~
$ PROX_SUP_SE <dbl> 46.57929, 27.54419, 29.04538, 25.79186, 29.471~
$ NUM_KIN_SE <dbl> 6312.955, 5186.056, 7882.663, 4705.616, 6429.2~
$ NUM_CC_SE <dbl> 3376.605, 1758.835, 3540.494, 1769.266, 4205.2~
$ NUM_STOP_SE <dbl> 1906.4874, 1646.1514, 1989.5683, 1570.5296, 27~
$ NUM_PRI_SE <dbl> 7858.392, 6023.790, 7674.683, 5301.985, 7121.3~
$ Intercept_TV <dbl> -4.0778837, -0.9619993, -1.6884182, -0.8524793~
$ AREA_SQM_TV <dbl> 6.1733609, 3.5597682, 2.3376124, 3.7240730, 3.~
$ LEASE_YRS_TV <dbl> 30.715324, 25.415609, 33.830822, 30.197599, 22~
$ PROX_EC_TV <dbl> -1.06583991, -0.16781867, -3.17925848, -0.0632~
$ PROX_HA_TV <dbl> -1.09715707, 1.57298537, -0.96257233, 4.030717~
$ PROX_MRT_TV <dbl> -2.3182709, -0.6381906, -3.0452410, -1.5782171~
$ PROX_GDPRI_TV <dbl> 1.1545996, -1.5562398, -1.2749960, -2.6675100,~
$ PROX_MALL_TV <dbl> 1.6253693, -2.6057553, -0.6266042, -1.2483452,~
$ PROX_SUP_TV <dbl> 1.56814872, 3.24968852, -1.41984837, 2.3537422~
$ NUM_KIN_TV <dbl> 0.24961745, 2.50291168, 0.25802055, 1.92103155~
$ NUM_CC_TV <dbl> -0.6183908, 1.4533264, -0.7712001, 1.5855880, ~
$ NUM_STOP_TV <dbl> 4.0303694, -0.1873500, 1.2915105, 0.4036109, 1~
$ NUM_PRI_TV <dbl> -0.7961504, -3.5495321, -2.7265193, -2.7775575~
$ Local_R2 <dbl> 0.8533566, 0.7946223, 0.9666896, 0.8001496, 0.~
$ coords.x1 <dbl> 29179.92, 28423.42, 30550.38, 28240.06, 30443.~
$ coords.x2 <dbl> 38822.08, 39745.94, 39588.77, 39477.60, 38382.~
$ geometry <POINT [m]> POINT (29179.92 38822.08), POINT (28423.~
tmap_mode("view")
tm_shape(mpsz_sf)+
tm_polygons(alpha = 0.1) +
tm_shape(flat_resale_sf_adaptive) +
tm_dots(col = "Local_R2",
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(11,14))
We can observe that majority of the HDB flats have Local R-squared values in the range from 0.6 to 1. This implies that most of the HDB Resale Prices are explained by the GWR Model.
To recap, we used the following factors to build the hedonic pricing model:
From our study, we found that the majority of these factors affect the resale prices of 4-room flats with a transaction period from 01-Jan-2019 to 30-Sep-2020. Only the factor PROX_CBD is not used in model building since it has high correlation with another factor GD_PRISCH.
onemapsgapi Mall-Coordinates-Web-Scraper Geocoding Singapore Coordinates: OneMap API OneMap API Docs A Comprehensive Guide to Singapore’s SAP Schools Gifted Education Programme (GEP) Schools in Singapore: Which One to Choose?